Porto Alegre
Federated Learning Framework for Scalable AI in Heterogeneous HPC and Cloud Environments
Ghimire, Sangam, Timalsina, Paribartan, Bhurtel, Nirjal, Neupane, Bishal, Shrestha, Bigyan Byanju, Bhattarai, Subarna, Gaire, Prajwal, Thapa, Jessica, Jha, Sudan
As AI models continue to grow in complexity and size, so does the demand for vast computational resources and access to large-scale distributed datasets. At the same time, growing concerns about data privacy, ownership, and regulatory compliance make it increasingly difficult to centralize data for training. Federated Learning (FL) has emerged as a promising paradigm for addressing these challenges, enabling the collaborative training of models across multiple data silos without requiring the raw data to leave its source. While FL has gained traction in mobile and edge environments, such as smartphones and IoT devices, its application in large-scale computing platforms like High-Performance Computing (HPC) clusters and cloud infrastructure remains underexplored. Meanwhile, the convergence of HPC and cloud computing is reshaping the landscape of modern data-intensive applications. These hybrid environments combine the raw power and efficiency of HPC with the scalability and flexibility of the cloud, making them well-suited for training large AI models. However, this integration brings new challenges: heterogeneous hardware (e.g., Central Processing Units (CPUs), Graphics Processing Units (GPUs), Tensor Processing Units (TPUs)), inconsistent network performance, dynamic resource availability, and non-uniform data distributions across clients. In this context, deploying FL across such mixed infrastructure is both a timely opportunity and a technical challenge. This paper explores how FL can be adapted and optimized to run efficiently across heterogeneous HPC and cloud environments, with a focus on scalability, system resilience, and performance under non-IID data conditions.
- South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
- (2 more...)
- Information Technology > Services (1.00)
- Information Technology > Security & Privacy (1.00)
Synthetic Data: AI's New Weapon Against Android Malware
Nogueira, Angelo Gaspar Diniz, Paim, Kayua Oleques, Bragança, Hendrio, Mansilha, Rodrigo Brandão, Kreutz, Diego
The ever-increasing number of Android devices and the accelerated evolution of malware, reaching over 35 million samples by 2024, highlight the critical importance of effective detection methods. Attackers are now using Artificial Intelligence to create sophisticated malware variations that can easily evade traditional detection techniques. Although machine learning has shown promise in malware classification, its success relies heavily on the availability of up-to-date, high-quality datasets. The scarcity and high cost of obtaining and labeling real malware samples present significant challenges in developing robust detection models. In this paper, we propose MalSynGen, a Malware Synthetic Data Generation methodology that uses a conditional Generative Adversarial Network (cGAN) to generate synthetic tabular data. This data preserves the statistical properties of real-world data and improves the performance of Android malware classifiers. We evaluated the effectiveness of this approach using various datasets and metrics that assess the fidelity of the generated data, its utility in classification, and the computational efficiency of the process. Our experiments demonstrate that MalSynGen can generalize across different datasets, providing a viable solution to address the issues of obsolescence and low-quality data in malware detection. With approximately 3 billion Android devices in operation worldwide [1], the mobile cybersecurity landscape faces formidable challenges. In 2024 alone, Kaspersky reported over 33.3 million cyberattacks targeting smartphone users globally, encompassing diverse forms of malware and unwanted software [2]. Adding to this problem, attackers are using Artificial Intelligence (AI) to rapidly generate new malware variants by exploiting patterns learned from existing malware [3].
- South America > Brazil > Rio Grande do Sul > Porto Alegre (0.05)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.68)
Reducing Instability in Synthetic Data Evaluation with a Super-Metric in MalDataGen
da Silva, Anna Luiza Gomes, Kreutz, Diego, Diniz, Angelo, Mansilha, Rodrigo, da Fonseca, Celso Nobre
Evaluating the quality of synthetic data remains a persistent challenge in the Android malware domain due to instability and the lack of standardization among existing metrics. Experiments involving ten generative models and five balanced datasets demonstrate that the Super-Metric is more stable and consistent than traditional metrics, exhibiting stronger correlations with the actual performance of classifiers. Synthetic data generation has become an increasingly relevant strategy in cybersecurity [1], [2], [3], particularly as a way to mitigate the scarcity of real, complete, and high-quality datasets that limit the performance and generalization of machine learning models. Despite these advances, assessing the quality of synthetic data remains a complex and largely non-standardized methodological challenge [4], with no clear consensus on which metrics should be used or how to combine them consistently. The literature reports a significant fragmentation in the application of fidelity metrics, with studies identifying more than 65 distinct indicators used independently to assess fidelity [5]. This diversity hinders model-to-model comparison, reduces experimental reproducibility, and complicates the integrated interpretation of data quality.
SpellForger: Prompting Custom Spell Properties In-Game using BERT supervised-trained model
Silva, Emanuel C., Salum, Emily S. M., Arantes, Gabriel M., Pereira, Matheus P., Oliveira, Vinicius F., Bicho, Alessandro L.
Introduction: The application of Artificial Intelligence in games has evolved significantly, allowing for dynamic content generation. However, its use as a core gameplay co-creation tool remains underexplored. Objective: This paper proposes SpellForger, a game where players create custom spells by writing natural language prompts, aiming to provide a unique experience of personalization and creativity. Methodology: The system uses a supervised-trained BERT model to interpret player prompts. This model maps textual descriptions to one of many spell prefabs and balances their parameters (damage, cost, effects) to ensure competitive integrity. The game is developed in the Unity Game Engine, and the AI backend is in Python. Expected Results: We expect to deliver a functional prototype that demonstrates the generation of spells in real time, applied to an engaging gameplay loop, where player creativity is central to the experience, validating the use of AI as a direct gameplay mechanic.
- South America > Brazil > Bahia > Salvador (0.06)
- South America > Brazil > Rio Grande do Sul > Porto Alegre (0.05)
Sabiá: A Generative Artificial Intelligence Chatbot for Day-to-Day Support in Higher Education (Sabiá: Um Chatbot de Inteligência Artificial Generativa para Suporte no Dia a Dia do Ensino Superior)
Rodrigues, Guilherme Biava, Beal, Franciele, Marcon, Marlon, Souza, Alinne Cristinne Corrêa, Ortoncelli, André Roberto, Souza, Francisco Carlos Monteiro, Silva, Rodolfo Adamshuk
Students often report difficulties in accessing day-to-day academic information, which is usually spread across numerous institutional documents and websites. This fragmentation results in a lack of clarity and causes confusion about routine university information. This project proposes the development of a chatbot using Generative Artificial Intelligence (GenAI) and Retrieval-Augmented Generation (RAG) to simplify access to such information. Several GenAI models were tested and evaluated based on quality metrics and the LLM-as-a-Judge approach. Among them, Gemini 2.0 Flash stood out for its quality and speed, and Gemma 3n for its good performance and open-source nature.
- South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
- North America > United States (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)
Constructing an Optimal Behavior Basis for the Option Keyboard
Alegre, Lucas N., Bazzan, Ana L. C., Barreto, André, da Silva, Bruno C.
Multi-task reinforcement learning aims to quickly identify solutions for new tasks with minimal or no additional interaction with the environment. Generalized Policy Improvement (GPI) addresses this by combining a set of base policies to produce a new one that is at least as good -- though not necessarily optimal -- as any individual base policy. Optimality can be ensured, particularly in the linear-reward case, via techniques that compute a Convex Coverage Set (CCS). However, these are computationally expensive and do not scale to complex domains. The Option Keyboard (OK) improves upon GPI by producing policies that are at least as good -- and often better. It achieves this through a learned meta-policy that dynamically combines base policies. However, its performance critically depends on the choice of base policies. This raises a key question: is there an optimal set of base policies -- an optimal behavior basis -- that enables zero-shot identification of optimal solutions for any linear tasks? We solve this open problem by introducing a novel method that efficiently constructs such an optimal behavior basis. We show that it significantly reduces the number of base policies needed to ensure optimality in new tasks. We also prove that it is strictly more expressive than a CCS, enabling particular classes of non-linear tasks to be solved optimally. We empirically evaluate our technique in challenging domains and show that it outperforms state-of-the-art approaches, increasingly so as task complexity increases.
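The GPI step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a finite action set and assumes that each base policy's action values in the current state are linear in a task weight vector `w` (the linear-reward case). The names `gpi_action`, `linear_q_values`, `psi`, and the toy numbers are hypothetical and are not taken from the paper. The sketch covers only plain GPI, not the Option Keyboard's learned meta-policy or the paper's basis-construction method.

```python
from typing import List, Sequence


def linear_q_values(
    psi: Sequence[Sequence[Sequence[float]]], w: Sequence[float]
) -> List[List[float]]:
    """Compute Q_i(s, a) = psi_i(s, a) . w for every base policy i and action a.

    psi[i][a] is an assumed per-policy feature vector for action a in the
    current state; w is the weight vector that defines the task's reward.
    """
    return [
        [sum(f * wk for f, wk in zip(features, w)) for features in per_policy]
        for per_policy in psi
    ]


def gpi_action(q_values: Sequence[Sequence[float]]) -> int:
    """Return argmax_a max_i Q_i(s, a): act greedily on the best base-policy value."""
    n_actions = len(q_values[0])
    # For each action, keep the highest value promised by any base policy.
    best_per_action = [max(q[a] for q in q_values) for a in range(n_actions)]
    # Pick the action whose best promised value is largest.
    return max(range(n_actions), key=best_per_action.__getitem__)


# Toy data: 2 base policies, 3 actions, 2 reward features (illustrative only).
psi = [
    [[1.0, 0.0], [0.5, 0.5], [0.0, 0.2]],  # base policy 0
    [[0.0, 0.4], [0.2, 0.2], [0.3, 0.9]],  # base policy 1
]

action_task_a = gpi_action(linear_q_values(psi, [1.0, 0.0]))  # task rewards feature 0
action_task_b = gpi_action(linear_q_values(psi, [0.0, 1.0]))  # task rewards feature 1
```

With these toy numbers the two tasks select different actions from the same pair of base policies, which is the zero-shot reuse the abstract refers to. Whether the resulting behavior is optimal for a given `w` depends on which base policies are in the set.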
- North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
- Europe > Austria > Vienna (0.14)
- South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
- (12 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Research Report > Promising Solution (0.86)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Taskmaster Deconstructed: A Quantitative Look at Tension, Volatility, and Viewer Ratings
Taskmaster is a British television show that combines comedic performance with a formal scoring system. Despite the appearance of structured competition, it remains unclear whether scoring dynamics contribute meaningfully to audience engagement. We conducted a statistical analysis of 162 episodes across 18 series, using fifteen episode-level metrics to quantify rank volatility, point spread, lead changes, and winner dominance. None of these metrics showed a significant association with IMDb ratings, even after controlling for series effects. Long-term trends suggest that average points have increased over time, while volatility has slightly declined and rank spread has remained stable. These patterns indicate an attempt to enhance competitive visibility without altering the show's structural equilibrium. We also analyzed contestant rank trajectories and identified five recurring archetypes describing performance styles. These patterns suggest that viewer interest is shaped more by contestant behavior than by game mechanics.
- North America > United States > New York (0.05)
- Europe > Ireland (0.04)
- South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
- (2 more...)
- Leisure & Entertainment (1.00)
- Media > Television (0.68)
- Media > Film (0.46)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Communications > Social Media (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Using LLMs to Program Board Games and Variants (Usando LLMs para Programar Jogos de Tabuleiro e Variações)
Becker, Álvaro Guglielmin, Rossato, Lana Bertoldo, Tavares, Anderson Rocha
Creating programs to represent board games can be a time-consuming task. Large Language Models (LLMs) arise as appealing tools to expedite this process, given their capacity to efficiently generate code from simple contextual information. In this work, we propose a method to test how capable three LLMs (Claude, DeepSeek and ChatGPT) are at creating code for board games, as well as new variants of existing games.
- South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
- North America > United States (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
Boardwalk: Towards a Framework for Creating Board Games with LLMs
Becker, Álvaro Guglielmin, de Oliveira, Gabriel Bauer, Rossato, Lana Bertoldo, Tavares, Anderson Rocha
Implementing board games in code can be a time-consuming task. However, Large Language Models (LLMs) have been proven effective at generating code for domain-specific tasks with simple contextual information. We aim to investigate whether LLMs can implement digital versions of board games from rules described in natural language. This would be a step towards an LLM-assisted framework for quick board game code generation. We expect to determine the main challenges for LLMs to implement the board games, and how different approaches and models compare to one another. We task three state-of-the-art LLMs (Claude, DeepSeek and ChatGPT) with coding a selection of 12 popular and obscure games in free-form and within Boardwalk, our proposed General Game Playing API. We anonymize the games and components to avoid evoking pre-trained LLM knowledge. The implementations are tested for playability and rule compliance. We evaluate success rate and common errors across LLMs and game popularity. Our approach proves viable, with the best-performing model, Claude 3.7 Sonnet, yielding 55.6% of games without any errors. While compliance with the API increases error frequency, the severity of errors is more significantly dependent on the LLM. We outline future steps for creating a framework to integrate this process, making the elaboration of board games more accessible.
- South America > Brazil > Bahia > Salvador (0.06)
- North America > United States (0.04)
- South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
- (2 more...)
MH-1M: A 1.34 Million-Sample Comprehensive Multi-Feature Android Malware Dataset for Machine Learning, Deep Learning, Large Language Models, and Threat Intelligence Research
Braganca, Hendrio, Kreutz, Diego, Rocha, Vanderson, Assolin, Joner, Feitosa, Eduardo
We present MH-1M, one of the most comprehensive and up-to-date datasets for advanced Android malware research. The dataset comprises 1,340,515 applications, encompassing a wide range of features and extensive metadata. To ensure accurate malware classification, we employ the VirusTotal API, integrating multiple detection engines for comprehensive and reliable assessment. Our GitHub, Figshare, and Harvard Dataverse repositories provide open access to the processed dataset and its extensive supplementary metadata, totaling more than 400 GB of data and including the outputs of the feature extraction pipeline as well as the corresponding VirusTotal reports. Our findings underscore the MH-1M dataset's invaluable role in understanding the evolving landscape of malware. The pervasive spread of Android malware poses a significant challenge for cybersecurity research. This challenge stems mainly from the open-source nature and affordability of Android platforms, which grant users access to a large market of free applications. At the same time, malware continually evolves, adapting its tactics to execute more sophisticated and frequent attacks. Such attacks often result in data destruction, information theft, and several other cybercrimes [1], [2], [3]. Machine learning (ML) algorithms have been widely used to uncover malware and have demonstrated remarkable effectiveness in detection systems, leveraging their discriminative capabilities to identify new variants of malicious applications [4], [5], [6]. To mitigate these risks, researchers have developed a variety of methods for detecting Android malware, establishing machine learning as a central focus of contemporary mobile security research [7], [8], [9]. However, the effectiveness of ML models is highly dependent on the quality of the datasets used for training.
Many existing datasets suffer from limitations such as outdated data, inadequate representation, and a limited number of samples and features, making them unsuitable for modern malware detection [10], [2], [11], [12]. These issues raise concerns about the reliability of reported performance metrics and can potentially lead to misleading conclusions [2]. A growing body of research in Android malware detection strongly supports the notion that increasing the number of discriminative features can significantly improve classification performance [13], [14], [15]. We present in Table I an overview of widely used Android malware datasets from recent years.
- North America > United States (0.04)
- South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)